A cluster problem as defined by nearest neighbours
نویسندگان
چکیده
Randomly generated points in IR are connected to their nearest neighbours (Euclidean distance). The resulting connected clusters of points are studied. This paper examines questions related to the collection of clusters formed and to the internal structure of a cluster. In particular, the one-dimensional structure is examined in detail.
منابع مشابه
Pseudo-Likelihood Inference Underestimates Model Uncertainty: Evidence from Bayesian Nearest Neighbours
When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for such uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002 Cucala et al. 2009). We present some evidence to show that the pseudo-likelihood approach for BKNN, even after being corrected by Cucala et al. (2009), still significantly underest...
متن کاملSNN: A Supervised Clustering Algorithm
In this paper, we present a new algorithm based on the nearest neighbours method, for discovering groups and identifying interesting distributions in the underlying data in the labelled databases. We introduces the theory of nearest neighbours sets in order to base the algorithm S-NN (Similar Nearest Neighbours). Traditional clustering algorithms are very sensitive to the user-defined parameter...
متن کاملEvolutionary Nearest Neighbour Classification Framework
Data classification attempts to assign a category or a class label to an unknown data object based on an available similar data set with class labels already assigned. K nearest neighbor (KNN) is a widely used classification technique in data mining. KNN assigns the majority class label of its closest neighbours to an unknown object, when classifying an unknown object. The computational efficie...
متن کاملKDDClus: A Simple Method for Multi-Density Clustering
Automated clustering of multi-density spatial data is developed. The algorithm KDDClus serves as an enhancement to the wellknown DBSCAN. Averaging the distances of a pattern to all k of its nearest neighbours allows a smoothing out of noise while automatically detecting the “knees” from the k-distance plot. The use of the KD-tree data structure enables efficient computation of the k-nearest nei...
متن کاملEnsembles of Nearest Neighbours for Cancer Classification Using Gene Expression Data
It is known that an ensemble of classifiers can outperform a single best classifier if classifiers in the ensemble are sufficiently diverse (i.e., their errors are as much uncorrelated as possible) and accurate. We study ensembles of nearest neighbours for cancer classification based on gene expression data. Such ensembles have been rarely used, because the traditional ensemble methods such as ...
متن کامل